Abstract — Although wireless sensor networks (WSNs) are powerful
in monitoring physical events, the data collected from a
WSN are almost always incomplete if the surveyed physical event 
spreads over a wide area. The reason for this incompleteness is 
twofold: i) insufficient network coverage and ii) data aggregation 
for energy saving. Whereas the existing recovery schemes only 
tackle the second aspect, we develop Dual-lEvel Compressed 
Aggregation (DECA) as a novel framework to address both 
aspects. Specifically, DECA allows a high fidelity recovery of
a widespread event, under the situations that the WSN only 
sparsely covers the event area and that an in-network data 
aggregation is applied for traffic reduction. Exploiting both the 
low-rank nature of real-world events and the redundancy in 
sensory data, DECA combines matrix completion with a fine- 
tuned compressed sensing technique to conduct a dual-level 
reconstruction process. We demonstrate that DECA can recover 
a widespread event with less than 5% of the data, with respect 
to the dimension of the event, being collected. Performance 
evaluation based on both synthetic and real data sets confirms the 
recovery fidelity and energy efficiency of our DECA framework. 

Index Terms — Compressed Sensing, Wireless Sensor Networks, 
Data Aggregation, Diffusion Wavelets, Matrix Completion 

I. Introduction 

As wireless sensor networks (WSNs), with their networked 
sensors, have the ability of "merging" into physical envi- 
ronments, they are generally considered as powerful tools to 
survey or monitor physical events. Several real systems have
emerged, including FireFly [1] that tracks the position
of miners, GreenOrbs [2] that collects ecological information
from a forest, and many others. We may roughly categorize the 
events subject to WSNs' surveillance into two types. On one 
hand, burst events take place only sporadically, and monitoring 
such events often boils down to detecting abnormal changes 
in an area. For example, a sudden temperature change in a 
warehouse may signal a fire alarm. On the other hand, a 
field (of certain physical quantities, e.g., humidity or pollution 
level) has a smooth distribution over a wide area and usually 
undergoes gradual changes. We illustrate these two types of 
events in Fig. 1.

Whereas monitoring burst events may only require intermit- 
tent data transmissions across a WSN to report abrupt changes, 
surveying a field does demand a constant data gathering from 
a large-scale WSN that is meant to sufficiently cover the 




(a) Burst events (b) Field 

Fig. 1. Two types of physical events. 



monitored area. Obviously, in providing high fidelity field 
surveillance, the energy efficiency issue of WSNs becomes 
a bottleneck. In this paper, we aim at tackling the conflict be- 
tween sensory data fidelity and energy efficient data gathering 
for WSNs that perform field surveillance. 



A. Problem Overview and Motivations 

Energy efficient data gathering in WSNs has been a long 
standing topic since the inception of these networked sensing 
systems. While the approaches involving routing or scheduling 
focus on improving the efficiency of data transportation [3],
[4], [5], data aggregation¹ directly fights the redundancy in
sensory data, striving to significantly reduce the amount of
data to be transported (e.g., [6], [7], [8]). Consequently, data
aggregation is often deemed as a crucial mechanism to achieve 
energy efficient data gathering for WSNs. 

1) Conventional Data Aggregation: In general, data ag- 
gregation can be either lossy or lossless. Lossy aggregation 
usually adopts simple aggregation functions (e.g., MIN, MAX, 
or SUM) and only extracts certain features from the sensory 
data (e.g., |6|). Obviously, though this approach may improve 
the efficiency of monitoring burst events, it is definitely not 
suitable for field surveillance, as, apart from a few features, 
most of the information about a field is lost and is beyond 



¹We use the term "data aggregation" in a broad sense: it refers to any
in-network traffic reduction mechanism. 



recoverability. Lossless aggregation² is closely related to data
compression: they both aim at "squeezing" the redundancy 
or insignificant components of a given data set to reduce its 
volume Q, m. However, unlike common data compression 
(where the underlying statistic model is known a priori or 
can be easily discovered), the model that describes data 
correlations for a sensing field is often unknown or may vary 
in time. As a result, distributed source coding techniques [7]
using, for example, Slepian-Wolf coding are not exactly practical.
Moreover, collaboratively discovering the data correlation
structure [8] leads to high communication load that offsets the
benefit of this aggregation technique. 

If we consider the field under WSN surveillance as an 
"image", image compression techniques (e.g., DCT or wavelet 
based) appear to be a good way to realize lossless aggregation 
[9]. Unfortunately, this approach faces several major difficulties.
Firstly, unless sensors are deployed to monitor every
"pixel" of a field, the sensory data are not amenable to a 2D 
transformation. However, WSNs can be randomly deployed in 
order to avoid high cost at the deployment phase. Secondly, 
even if 2D transformations can be applied to a regularly 
deployed WSN, applying such transformations in-network can 
bring high overhead, due to the need of exchanging coefficients 
among coding nodes. Finally, given the difficulty in using 
2D transformations, taking into account higher dimensional 
correlation of the field data (e.g., the temporal correlation) 
becomes almost impossible. 

2) Compressed Sensing Based Data Aggregation: Following
several celebrated works in signal processing [10], [11],
compressed sensing (CS) - the technique for finding sparse
solutions to underdetermined linear systems - has been intensively
studied. It suggests an easy way to acquire and
reconstruct a signal given that it is sparse or compressible.
Right after its development, CS was introduced into WSNs
as a data aggregation technique [12], [13], [14], [15], [16],
[17]. CS promises to deliver, with high probability, a full
recovery of signals from far fewer samples than their original
dimension, as long as the signals are sparse or compressible
in some domain [18]. In fact, the encoding process does
not rely on the data correlation structure and the sensor
nodes are not supposed to be aware of the correlation [12],
which directly translates to the model-less "compression" and
the blind encoding. In addition, the in-network aggregation
required by CS incurs very light computation load [16]. All
these make CS aggregation very attractive.

However, three main issues are still hampering the practical 
use of CS aggregation. Firstly, as the existing techniques make 
use of conventional signal/image compression domains (e.g., 
DCT domain) for sensory data, the need for regular WSN 
deployments persists [13]. Secondly, even if one could fully
recover the sensory data obtained by a regularly deployed 
WSN, there is no guarantee that these data can faithfully 

²Lossless aggregation based on lossy compression techniques (e.g., wavelet
compression) may still sacrifice data fidelity. Therefore, lossless aggregation 
is so termed to emphasize its intention to preserve the field information, rather 
than only extracting a few features. 



represent the field under surveillance, as the size of a WSN is 
often insufficient to cover a field. Last but not least, designing 
energy efficient routing is highly nontrivial as it may involve 
the coherence between the sparse domain and the network 
topology [14], [16], [17].

B. Our Approach and Contributions 

We first acknowledge that it is an ill-posed problem to 
directly recover a surveyed field from CS aggregated (hence 
under-sampled) data. Our response is the Dual-lEvel Com- 
pressed Aggregation (DECA) framework. In essence, DECA 
recovers, at the first level, the sensory data obtained by the 
whole WSN from the CS aggregated data. Then at the second 
level, DECA recovers the field based on the outcome of 
the first level. This decomposition in CS recovery brings 
several great benefits, in response to the hampering issues 
we mentioned above. First of all, adequate CS techniques can 
be applied to individual levels to achieve the best recovery 
performance and to avoid the requirement for regular WSN 
deployments. Secondly, the field can be recovered from the 
CS aggregated data, even though the original sensory data are 
only random samples of the field. Finally, an energy efficient 
routing technique can be deployed without incurring too much 
complexity for in-network coordinations. 

In proposing our DECA framework, we are making the 
following main contributions: 

• We propose, for the first time, the concept of dual- 
level CS aggregation and field recovery, dedicated to 
WSNs that monitor smoothly distributed (both spatial and 
temporal) physical quantities. 

• We apply diffusion wavelets to the first-level recovery 
(from CS aggregated data to the original sensory data), 
and we propose novel diffusion operators to achieve the 
best recovery performance. These operators also allow 
temporal correlation to be naturally taken into account. 

• We apply matrix completion scheme to the second-level 
recovery (from sensory data to fields). We discover that 
the performance is as good as if the sensory data were 
directly collected, although they are actually recovered 
from CS aggregated data. 

• We show that, under the DECA framework, the in- 
network computation is extremely light for sensor nodes, 
and natural tree partitions of a WSN can lead to a 
significant energy saving for the overall data collection 
process. 

C. Roadmap

In the remainder of our paper, we first survey the related
literature on CS aggregation in Sec. II. Then we present the
basic principles concerning the building blocks of DECA in
Sec. III. We focus on the design of DECA in Sec. IV. We
evaluate the performance of DECA in terms of both data
fidelity and energy efficiency in Sec. V. Finally, we conclude
our paper in Sec. VI.



II. Related Work 

Our discussion in this section focuses only on CS aggregation
and related applications of CS to networking issues. As
we explained in Sec. I-A, a lossy aggregation is meaningful
only to burst events, while a lossless aggregation requires
either model awareness or regular sensor deployments. Given
the absence of parallels between DECA and these approaches 
(as DECA demands none of these prerequisites), we omit their 
discussions. 

A. CS Data Aggregation in WSNs 

Two of the earliest contributions in applying CS to WSN
data collection are by Bajwa et al. [19] and Duarte et al. [20].
However, [19] only involves single-hop communications
and hence does not really concern data aggregation, while [20]
focuses on compressing temporally correlated data and
relies on existing protocols to take care of multi-hop communications.

Quer et al. [13] investigated the CS aggregation performance
along with routing costs in multi-hop WSNs. They
concluded that the accuracy of CS recovery depends on routing
paths. However, this is an artifact introduced by defining the
sensing matrix (see Sec. III-A) of CS according to the routing
paths. Moreover, as we mentioned in Sec. I-A2, [13] requires a
regular WSN deployment on a grid of cells and a full coverage
of the surveyed field, i.e., one sensor per cell.

Lee et al. [14] also targeted the CS aggregation issue.
They aimed at identifying proper network partitions for energy
efficient CS aggregation. The main conclusion drawn in [14] is
that, if CS aggregations are performed for individual partitions
of a WSN, the sensing matrix has to take the characteristics
of the sparse domain into account. This, unfortunately,
contradicts the spirit of CS, i.e., sensing matrices can be
random. As we will show in Sec. IV-D, if a sparse domain
is properly chosen, the signal energy mostly concentrates on
the "low frequency" components. Therefore, simple sensing
matrices (e.g., Bernoulli random matrix) still suffice for CS
aggregation performed for individual partitions.

Two later (independent) proposals [15] and [16] gave more
emphasis on routing efficiency in CS aggregation. [15] proved
that, if k random samples are aggregated from a WSN of n
nodes, the throughput is n/k times higher. However, a more
detailed investigation (involving an interference model and
scheduling) in [16] showed that the plain CS aggregation used
in [15] may have a throughput even lower than no aggregation
at all (non-aggregation hereafter). [16] further proposed the
so-called hybrid CS aggregation; it achieves a throughput
always better than non-aggregation. [15] also evaluated the
performance of CS recovery. However, only data sampled from
one-dimensional signals (or sampled in 2D but reducible
to vectors) are treated.

B. Other Applications of CS in Networking 

Applications of CS to other networking issues came earlier 
than CS aggregation in WSNs. These applications are mostly 
concerned with traffic measurements in the Internet. Coates 



et al. [21] exploited the performance correlations between
overlapping paths and proposed to use CS to reduce the
number of measurements. Their proposal of using diffusion
wavelets to accommodate measures taken on an arbitrary
network topology has motivated our first-level recovery in
DECA. However, the design of diffusion operators is totally
different due to distinct correlation structures in data.

Leveraging the low rank feature of the Internet traffic
matrices (TMs), Zhang et al. [22] applied the matrix
completion technique (the most up-to-date development in
CS) to recover TMs from highly incomplete samples. They
demonstrated that the matrix completion technique consistently
outperforms other commonly used methods such as
singular value decomposition (SVD). We make a different use
of matrix completion in the second-level recovery of DECA,
considering the monitored field as an "image" and hence a
matrix. The low rankness of this matrix is obvious: as long as
an image does not just contain noise, it always has a sparse
or compressible representation under SVD.

Applications of CS in wireless networks other than data 
aggregation also appeared recently. Charbiwala et al. [23]
proposed to use CS as a kind of erasure coding strategy for
forward error correction, aiming at improving the robustness
for data transmission. Rallapalli et al. [24] looked at the localization
problem in mobile networks. The rationale for applying
CS techniques is twofold: the matrix of node coordinates can 
be well approximated by a low-rank matrix, and the mobility 
features of nodes are temporally correlated. 

III. Background and Rationales 

In this section, we introduce the principles of DECA's build- 
ing blocks. The first three topics, namely compressed sensing, 
diffusion wavelets, and matrix completion, are concerning data 
sampling and recovery procedures, and the last topic deals with 
efficient routing structure to support CS compression. 

A. Compressed Sensing (CS) 

The theory of CS was pioneered by Candès and Tao [10],
as well as Donoho [11], and later developed by many others
(e.g., fTTj). The theory asserts that one can recover a certain 
data set from far fewer samples, as long as the data set has a 
sparse representation in a domain, and the sampling process 
is largely incoherent with the basis that enables the sparse 
representation. 

Suppose an n-dimensional vector $u$ is m-sparse under a
proper domain spanned by $\Psi = [\psi_1, \ldots, \psi_n]$, where $\psi_i$
represents a column vector of the basis. We then have

$$u = \Psi w = \sum_{i=1}^{n} w_i \psi_i \quad \text{for } m \ll n, \qquad (1)$$

where $w$ is called the sparse representation of $u$: it has only
$m \ll n$ non-zero entries. Then the CS theory suggests that,
under certain conditions, instead of directly computing and
collecting the compressed coefficients $w$, we may collect a
slightly longer vector $v = \Phi u$, where $\Phi = [\phi_1, \ldots, \phi_n]$ is a
$k \times n$ "sensing" matrix corresponding to the sampling process.
Consequently, we can recover $u$ from $v$ with high probability
by solving the following convex optimization problem

$$\underset{w \in \mathbb{R}^n}{\text{minimize}} \; \|w\|_{\ell_1} \quad \text{subject to} \quad v = \Phi \Psi w, \qquad (2)$$

and by letting $\hat{u} = \Psi \hat{w}$, with $\hat{w}$ being the optimal solution
of (2). We hereafter refer to the random sampling process
$v = \Phi u$ as CS coding, and the process of recovering $u$ by
solving (2) as CS recovery.

The incoherence [10] between the sensing matrix $\Phi$ and
the sparse basis $\Psi$ is crucial to the recovery performance.
In practice, a $\Phi$ with Gaussian or Bernoulli entries largely
abides by the incoherence condition for any $\Psi$ if the number
of measurements satisfies $k > O(m \log n)$ [10]. The choices
for $\Psi$ include Fourier basis, wavelet basis, DCT basis, etc.,
depending on the specific applications. What really makes
CS attractive is that the sparse basis does not need to be
known during the encoding process, which makes it extremely
suitable for data aggregation in WSNs.
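
The CS coding and recovery steps described above can be sketched numerically. The following is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes an identity sparse basis, a Gaussian sensing matrix with entries drawn from N(0, 1/k), and it solves the equality-constrained l1 problem (2) as a linear program via SciPy's `linprog`. The function name `cs_recover` and all parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def cs_recover(Phi, Psi, v):
    """Solve min ||w||_1 subject to v = Phi @ Psi @ w, then return u = Psi @ w."""
    A = Phi @ Psi
    n = A.shape[1]
    # Split w = p - q with p, q >= 0, so that ||w||_1 = sum(p) + sum(q).
    cost = np.ones(2 * n)
    A_eq = np.hstack([A, -A])
    res = linprog(cost, A_eq=A_eq, b_eq=v, bounds=(0, None), method="highs")
    w = res.x[:n] - res.x[n:]
    return Psi @ w

rng = np.random.default_rng(0)
n, m, k = 64, 3, 32                      # dimension, sparsity, measurements (assumed values)
Psi = np.eye(n)                          # assumed sparse basis: identity
w_true = np.zeros(n)
w_true[rng.choice(n, size=m, replace=False)] = rng.normal(size=m)
u = Psi @ w_true
Phi = rng.normal(scale=1.0 / np.sqrt(k), size=(k, n))  # entries ~ N(0, 1/k)
v = Phi @ u                              # CS coding: k values instead of n
u_hat = cs_recover(Phi, Psi, v)          # CS recovery
recovery_error = np.linalg.norm(u_hat - u)
```

Note that the encoder only needs `Phi`; the basis `Psi` enters at the decoder, which mirrors the "blind encoding" property stressed in the text.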

If the vector $u$ is not exactly sparse but only compressible, or
if the sampling comes with errors, the constraint in problem (2)
needs to be replaced by

$$\|\Phi \Psi w - v\|_{\ell_2} \le \epsilon, \qquad (3)$$

where $\epsilon$ is an error bound. Now, the theory requires $\Phi\Psi$ to obey the so-called restricted
isometry principle (RIP) in order to guarantee a successful
recovery [18]. In practice, for any fixed $\Psi$, RIP holds with high
probability if $\Phi$ has i.i.d. entries from the normal distribution
$\phi_{ij} \sim N(0, 1/k)$ or from a symmetric Bernoulli distribution
$\Pr(\phi_{ij} = \pm 1/\sqrt{k}) = 1/2$, and if $k > O(m \log(n/m))$.



B. Diffusion Wavelets 

Although CS allows a flexible choice for sparse bases,
most of the sparse bases work only for vectors sampled
from 1D signals³. In order to cope with vectors sampled on
manifolds or graphs (e.g., data sensed by a WSN), diffusion
wavelets are developed to generalize classic wavelets [25]. As
opposed to dilating a "mother wavelet" by powers of two to 
generate a set of classic wavelet bases, the dyadic dilation that 
generates diffusion wavelets relies on a diffusion operator. 
Here diffusion is used as a smoothing and scaling tool to 
enable multiscale analysis on manifolds or graphs. 

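Let us take an arbitrary graph $G$ as an example to illustrate
the idea. Suppose the weighted adjacency matrix of $G$ is $\Omega =
[\omega_{ij}]$, where $\omega_{ij}$ is the weight of edge $(i, j)$. Let $\Lambda = [\lambda_{ij}]$ be
the normalized Laplacian of $G$, defined as

$$\lambda_{ij} = \begin{cases} 1 - \dfrac{\omega_{ii}}{\sum_p \omega_{ip}} & i = j, \\ -\dfrac{\omega_{ij}}{\sqrt{\sum_p \omega_{ip} \sum_p \omega_{pj}}} & \text{otherwise.} \end{cases} \qquad (4)$$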



It is well known that $\Lambda$ characterizes the degree of correlations
between function values taken at vertices of the graph $G$ [26].

³Even for 2D signals such as images, they are sampled in a 1D manner
(row by row or column by column) to adapt to sparsifying transformations 
(e.g., wavelet transform). 



Roughly speaking, each eigenvalue (and the corresponding 
eigenvector) represents the correlation under a certain scale. 
In order to decompose the signal sampled on a graph in a
multiscale manner, one may consider partitioning the range
space of $\Lambda$. The idea behind diffusion wavelets is to construct
a diffusion operator $O$ from $\Lambda$, such that they share the same
eigenvectors whereas all eigenvalues of $O$ are smaller than 1.
Consequently, recursively raising $O$ to power 2 and applying a
fixed threshold to remove the diminishing eigenvalues (hence
the corresponding eigenvectors and the subspaces spanned by
them) lead to a dilation of the null space but a shrinkage of
the range space; this naturally produces space splitting.

More specifically, $O^{2^j}$ is computed at the j-th scale,
eigenvalue decomposition is derived for it, and the resulting
eigenvectors form a basis that (qualitatively) represents the
correlation over neighborhood of radius $2^j$ hops on the graph.
Denote the original range space of $O$ by $U_0 = \mathbb{R}^n$; it is
split recursively: at the j-th level, $U_{j-1}$ is split into two
orthogonal subspaces: the scaling subspace $U_j$ that is the range
space of $O^{2^j}$, and the wavelet subspace $V_j$ as the difference
between $U_j$ and $U_{j-1}$. Given a specified decomposition level
$\gamma$, the diffusion wavelet basis $\Psi$ is the concatenation of the
orthonormal bases of $V_1, \ldots, V_\gamma$ and $U_\gamma$. Interested readers
are referred to [25] for detailed exposition. We want to point
out that different diffusion operators lead to different wavelet
bases; therefore, the art of our later design lies in the proper
choice of an operator.
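
The recursive space splitting described above can be illustrated with a simplified spectral sketch. This is not the full diffusion wavelet construction of [25]: it assumes a symmetric operator with eigenvalues in [0, 1] and uses a single eigendecomposition, relying on the fact that a dyadic power of the operator shares its eigenvectors. The function name `diffusion_wavelet_basis`, the threshold `eps`, and the ring-graph demo are all hypothetical choices.

```python
import numpy as np

def diffusion_wavelet_basis(O, levels, eps=1e-2):
    """Split R^n by thresholding the eigenvalues of O^(2^j), j = 1..levels."""
    eigvals, eigvecs = np.linalg.eigh(O)          # O is assumed symmetric
    active = np.ones(len(eigvals), dtype=bool)    # U_0 = R^n
    blocks = []
    for j in range(1, levels + 1):
        powered = np.abs(eigvals) ** (2 ** j)     # eigenvalues of O^(2^j)
        keep = active & (powered >= eps)          # scaling subspace U_j
        blocks.append(eigvecs[:, active & ~keep]) # wavelet subspace V_j
        active = keep
    blocks.append(eigvecs[:, active])             # remaining scaling subspace
    return np.hstack(blocks)

# Demo on a ring of 8 nodes with unit edge weights and unit self-loops (assumed graph).
n = 8
Omega = np.eye(n)
for i in range(n):
    Omega[i, (i + 1) % n] = Omega[(i + 1) % n, i] = 1.0
deg = Omega.sum(axis=1)
Lam = np.eye(n) - Omega / np.sqrt(np.outer(deg, deg))  # normalized Laplacian
O = Lam / 2                                            # eigenvalues within [0, 1]
Psi = diffusion_wavelet_basis(O, levels=3)
```

The columns of `Psi` are simply the eigenvectors regrouped by the scale at which their eigenvalue falls below the threshold, so `Psi` stays orthonormal whatever operator is chosen.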

C. Matrix Completion (MC) 

As an extension to the classic CS techniques that work in 
vector spaces, similar results have been developed for matrices, 
under the name of matrix completion (MC) [27]. The assertion
is similar: a matrix can be recovered from a small set of
samples of its entries, as long as it is low rank and its singular
vectors are reasonably spread across all coordinates. 

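Suppose we are given a low rank matrix $M \in \mathbb{R}^{n_1 \times n_2}$ with rank $r$ and an
observation of $k$ entries of $M$ through a sampling operator
$\mathcal{P}_{\Omega}(M)$ such that

$$[\mathcal{P}_{\Omega}(M)]_{ij} = \begin{cases} M_{ij} & (i,j) \in \Omega, \; |\Omega| = k, \\ 0 & \text{otherwise,} \end{cases}$$

where $\Omega$ is a subset of $M$'s entries. If the number of samples $k$
and the matrix $M$ satisfy certain conditions, we may recover
with high probability the whole $M$ from the $k$ samples by
solving the following nuclear-norm minimization problem:

$$\text{minimize} \; \|X\|_* = \sum_{i=1}^{\min(n_1, n_2)} \sigma_i(X) \quad \text{subject to} \quad \mathcal{P}_{\Omega}(X) = \mathcal{P}_{\Omega}(M).$$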

This typical convex optimization problem is amenable to 
efficient solution techniques. 

The condition in terms of $k$ is $k > O(\max(n_1, n_2)\, r \log^{\tau}(\max(n_1, n_2)))$,
where $\tau$ is some constant [27]. The conditions for $M$ may have various
representations, and they generally confine the sparsity of



the singular vectors of M (or incoherence of M). Intuitively 
speaking, if some singular vectors of M are very sparse, 
random sampling may well miss those non-zero components 
and hence fail to preserve the structure of $M$. Different
representations of the incoherence condition can lead to slight
changes to the condition for $k$ (e.g., different constant $\tau$), and
sometimes even a different recovering algorithm [27].

We propose to use MC for recovering an image M from 
incomplete pixel samples, which differs from its original 
intention. Compared with the classic CS technique (discussed 
in Sec. III-A), MC differs in two ways. Firstly, MC has no
coding procedure, apart from a random sampling. Secondly,
MC does not require a sparse domain to be explicitly chosen
for the recovery, as the domain is suggested by SVD of the
matrix. While the first feature allows a WSN (whose size
is much smaller than the dimension of $M$) to be randomly
deployed for monitoring $M$, the second feature makes the
recovery independent of the targeted $M$.
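
To make the idea of recovering a low rank matrix from random entry samples concrete, the following sketch uses a simple iterative soft-thresholded SVD (a soft-impute style heuristic). This is a swapped-in technique chosen for brevity, not the solver referred to in [27] nor necessarily the one used by the authors; it approximates the nuclear-norm problem by penalizing singular values with an assumed small weight `lam`. The function name `complete_matrix`, the matrix size, and the sampling ratio are hypothetical.

```python
import numpy as np

def complete_matrix(observed, mask, lam=0.05, iters=500):
    """Fill unobserved entries by repeated singular value soft-thresholding."""
    X = np.zeros_like(observed)
    for _ in range(iters):
        # Keep the observed samples, use the current estimate elsewhere.
        filled = np.where(mask, observed, X)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Shrinking singular values promotes a low nuclear norm.
        X = (U * np.maximum(s - lam, 0.0)) @ Vt
    return X

rng = np.random.default_rng(1)
M = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))  # rank-2 test "field"
mask = rng.random(M.shape) < 0.6                          # random sampling, no coding
X = complete_matrix(np.where(mask, M, 0.0), mask)
rel_error = np.linalg.norm(X - M) / np.linalg.norm(M)
```

As in the text, the only "encoding" is the random choice of sampled entries, and no sparse basis has to be supplied: the SVD inside the loop plays that role.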

D. Compressed Data Aggregation (CDA) 

Assume we are given a WSN of $n$ nodes with each one
acquiring a sample $u_i$; the sink is supposed to collect all
data $u = [u_1, \ldots, u_n]^T$. Without data aggregation, clearly,
the most energy efficient routing strategy is a shortest path
tree (SPT) rooted at the sink, where the nodes around the sink
tend to carry heavy traffic load, as shown in Fig. 2(a).






(a) Non-aggregation 



(b) Plain CS aggregation (c) Hybrid CS aggregation 



Fig. 2. Different aggregations on a tree. The red stars denote the sink.
Numbers beside links represent the traffic load with underlines indicating the
CS coded traffic.



The recently developed CS theory suggests a way to relieve
the bottleneck [12], [15]. Let us rewrite the random sampling
in a column form $v = u_1 \phi_1 + \cdots + u_n \phi_n$. For CS-based data
aggregation, each node $i$ first "expands" its own sensory data
$u_i$ to $k$ coded items, which corresponds to $u_i \phi_i$. These $k$ data
items are then sent along a data collection tree. Whenever
more than one set of such data items converge at a node,
elements with the same indices are summed up. The eventual
outcome accomplishes the overall CS coding $v = \Phi u$. The
imposed identical flow (Fig. 2(b)), on one hand, eliminates the
conventional bottleneck; but on the other hand, it introduces
additional traffic to the leaf nodes.
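
The in-network summation described above can be checked with a small sketch. It assumes a hypothetical five-node tree given as a child-to-parent map, a sink labeled -1 that holds no sample, and a Gaussian sensing matrix; the helper names `depth` and `aggregate_on_tree` are illustrative only.

```python
import numpy as np

def depth(node, parent, sink):
    """Number of hops from node to the sink."""
    steps = 0
    while node != sink:
        node = parent[node]
        steps += 1
    return steps

def aggregate_on_tree(parent, u, Phi, sink):
    """Each node expands u_i into k coded items and adds what its children sent."""
    coded = {i: u[i] * Phi[:, i] for i in parent}  # node i holds u_i * phi_i
    coded[sink] = np.zeros(Phi.shape[0])
    # Deepest nodes transmit first, so every node has merged its children's
    # items before forwarding its own k-item message.
    for i in sorted(parent, key=lambda x: depth(x, parent, sink), reverse=True):
        coded[parent[i]] = coded[parent[i]] + coded[i]
    return coded[sink]

rng = np.random.default_rng(4)
parent = {0: -1, 1: -1, 2: 0, 3: 0, 4: 2}   # assumed tree, sink labeled -1
k, n = 3, 5
Phi = rng.normal(size=(k, n))
u = rng.normal(size=n)
v = aggregate_on_tree(parent, u, Phi, sink=-1)
```

Every link carries exactly k items, and the vector arriving at the sink coincides with the centralized product of the sensing matrix and the data vector, although no node ever sees the whole data vector.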

As an improvement, we proposed hybrid CS aggregation in
[16] that fully exploits the advantage of CS. In a nutshell, if
the number of data items converged at a certain node is below
$k$, no aggregation is performed. The CS aggregation starts to
work only when $k$ or more data items gather at a node. The
idea is illustrated in Fig. 2(c). Each aggregation is equivalent
to a partial CS coding $v' = \Phi' u$, where $\Phi'$ contains a subset
of all columns of $\Phi$.
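
The three aggregation modes can be compared by counting the data items carried on each tree link. The sketch below assumes unit-size data items, a hypothetical five-node tree rooted at sink 0, and k = 2; it reads the hybrid rule as "forward raw data while fewer than k items have gathered, otherwise forward k coded items". The helper names are illustrative.

```python
def subtree_sizes(parent, sink):
    """Number of nodes (itself included) whose data pass through each node."""
    size = {v: 0 for v in parent}
    for v in parent:
        node = v
        while node != sink:
            size[node] += 1
            node = parent[node]
    return size

def link_loads(parent, sink, k):
    """Items sent by each node to its parent under the three schemes."""
    size = subtree_sizes(parent, sink)
    return {
        "non_aggregation": dict(size),                      # raw relay
        "plain_cs": {v: k for v in parent},                 # k items everywhere
        "hybrid_cs": {v: min(size[v], k) for v in parent},  # switch at k items
    }

parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 3}   # assumed tree: child -> parent
loads = link_loads(parent, sink=0, k=2)
totals = {name: sum(load.values()) for name, load in loads.items()}
```

On this toy tree, plain CS aggregation relieves the link next to the sink but penalizes the leaves, whereas the hybrid scheme never sends more on a link than either alternative.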

In our recent work [28], we also investigated the energy
efficient configurations for the hybrid CS aggregation through
joint routing and CS aggregation. We have proven that the minimum
energy compressed data aggregation problem is in
general NP-complete by showing the equivalence between an
optimal tree with $k = 2$ and the maximum leaf spanning tree
(MLST) problem [29]. Then we designed an efficient greedy
heuristic to obtain near-optimal configurations. The basic
idea is to grow the "core" (the set of nodes that transmit $k$
samples) iteratively in a greedy manner until no node needs
to be added to the core; the remaining nodes simply transmit
non-aggregated data samples. We omit the algorithm details;
interested readers are referred to [28].

Our DECA framework makes use of the hybrid CS aggre- 
gation for joint CS coding and routing. However, instead of 
only considering a single tree, we partition a WSN into several 
trees and treat each tree independently, as will be addressed 
in Sec. IV-D.

IV. DECA: Decomposed CS Aggregation and Field 
Recovery 

In this section, we first introduce our network model and
formally define our problem in Sec. IV-A; we then present
the two recovery levels and the aggregation mechanism for
DECA in Sec. IV-B to Sec. IV-D.

A. Network Model and Problem Definition 

We assume a WSN is deployed to monitor a 2D area.
Without loss of generality, we assume this 2D area has a
rectangular shape. We partition this area into an $a \times b$ grid
of square cells; the size of a cell represents the sensing
coverage of a node. Sensor nodes are randomly deployed
with a coverage ratio $p$, i.e., a cell is covered by a node
with probability $p$. We represent the WSN by a connected
graph $G(V, E)$, where the vertex set $V$ corresponds to the
nodes in the network, and the edge set $E$ corresponds to the
wireless links between nodes. One special node $s \in V$ is
known as the sink; it collects data from the whole network.
We denote by $n$ the cardinality of $V$. Obviously, we have
$n = p(a \times b)$. Let $c : E \to \mathbb{R}^+_0$ be a cost assignment on $E$,
with $c(i,j) : (i,j) \in E$ being the energy expense of sending
one unit of data across link $(i,j)$.
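
The deployment model can be sketched as follows. The grid size, the coverage ratio, and the smooth test field are assumptions made only for illustration; none of them comes from the paper's evaluation.

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, p = 50, 50, 0.2                      # assumed grid size and coverage ratio

# Each cell hosts a sensor node independently with probability p.
covered = rng.random((a, b)) < p
n = int(covered.sum())                     # number of nodes, close to p * a * b
rows, cols = np.nonzero(covered)

# A hypothetical smooth field over the grid (one value per cell).
xx, yy = np.meshgrid(np.arange(b), np.arange(a))
F = np.sin(xx / 10.0) + np.cos(yy / 15.0)

# The sensory data vector holds the field values at the covered cells only.
u = F[rows, cols]
```

With p well below 1 the vector `u` has far fewer entries than the field has cells, which is the coverage gap that the second recovery level is meant to close.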

We assume that all nodes are roughly time synchronized 
and the data collection proceeds in rounds. At the beginning 
of a round $r$, node $i$ produces one sample $u_i^r$, and the sink
collects all information at the end. In order to avoid duplicated 
aggregations on the way towards the sink, we restrict the data 
aggregation on a tree rooted at the sink. Finally, we assume 
that the sink knows the locations of all nodes. We illustrate 
our assumptions in Fig. 3.

During a certain time period represented by a finite index
set $\mathcal{R}$ for all the rounds within this period, the WSN
produces a set of sensory data vectors $\{u^r\}_{r \in \mathcal{R}}$, where
$u^r = [u_1^r, u_2^r, \ldots, u_n^r]^T$ is the data vector produced during
Fig. 3. A WSN that monitors a rectangular area. The area is partitioned into
a grid of square cells, the pentagram indicates the sink, and only the links
used by the data collection tree are shown.



round $r$. During the same period, the field under surveillance
is represented by $\{F^r\}_{r \in \mathcal{R}}$, with $F^r$ being an $a \times b$ matrix that
models the area at round $r$. Then $F^r_{ij}$ refers to the value of the
monitored physical quantity at the i-th row and j-th column
of the discretized area during round $r$. In order to reduce
the energy consumption of the WSN, an in-network data
aggregation is performed such that the sink collects $\{v^r\}_{r \in \mathcal{R}}$,
which is a compressed version of $\{u^r\}_{r \in \mathcal{R}}$. Now our questions
are the following:

Q1: How can we recover $\{F^r\}_{r \in \mathcal{R}}$ from $\{v^r\}_{r \in \mathcal{R}}$?
Q2: What is the tradeoff relationship between recovery fidelity
and energy consumption?
We will answer Q1 by presenting our DECA framework
in the following, and we address Q2 when we evaluate the
performance of DECA in Sec. V. The general idea of DECA is
a decomposition between CS aggregation and the recovery of
the surveyed field. More specifically, we first recover $\{u^r\}_{r \in \mathcal{R}}$
from CS coded samples $\{v^r\}_{r \in \mathcal{R}}$, then $\{F^r\}_{r \in \mathcal{R}}$ is further
recovered from $\{u^r\}_{r \in \mathcal{R}}$. The classic CS technique is used for
the first level, but we propose specific diffusion wavelets to act 
as the sparse basis. This allows an arbitrary network topology 
for data sampling, as well as virtually any tree partitions to 
reduce the traffic load. The second level recovery takes the 
outcome of the first level as noisy sampling data, and manages 
to recover the field using MC. It absorbs the errors resulting 
from the previous level, and thus achieves almost the same 
accuracy as if the sensor data were fully collected. 

B. First Level Recovery 

The general idea for this level is to apply CS coding to
each $u^r$ through in-network data aggregation. In other words,
what the sink collects at the end of round $r$ is $v^r = \Phi u^r$. We
will explain how this CS coding is applied on top of routing
in Sec. IV-D. Here we are only concerned with recovering
$\{u^r\}_{r \in \mathcal{R}}$ from $\{v^r\}_{r \in \mathcal{R}}$. According to the discussion in
Sec. III-A, we could recover individual $u^r$ by solving an
$\ell_1$-minimization problem

$$\text{minimize} \; \|w\|_{\ell_1} \quad \text{subject to} \quad \|\Phi \Psi w - v^r\|_{\ell_2} \le \epsilon, \qquad (6)$$

to obtain the optimal solution $\hat{w}^r$ and by letting $\hat{u}^r = \Psi \hat{w}^r$.
We use an error bound as the constraint because
real world data are not strictly sparse but just
compressible. As this problem is well known and can be solved
by various techniques including, for example, $\ell_1$-magic [30]
and gradient projection for sparse reconstruction (GPSR) [31],
the art of our design lies in constructing $\Psi$.

1) Basis for Spatial Correlation: As $u^r$ is sampled by a
WSN with nodes randomly deployed, the basis $\Psi$ has to adapt
to this irregularity, and hence diffusion wavelet basis is an
ideal choice. According to Sec. III-B, diffusion wavelets are
generated by diffusion operator $O$, and $O$ in turn comes from
the weighted adjacency matrix $\Omega$. As $\omega_{ij}$ represents
the correlation between the data sampled at nodes $i$ and $j$,
it should be a function of distance if we want to represent
the spatial correlation. Let $d(i,j)$ be the Euclidean distance
between nodes $i$ and $j$; we define

$$\omega_{ij} = \begin{cases} d^{\alpha}(i,j) & i \ne j, \\ \beta & \text{otherwise,} \end{cases} \qquad (7)$$

where $\alpha < 0$ and $\beta$ is a small positive number. As a result,
the normalized Laplacian becomes



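$$\lambda_{ij} = \begin{cases} 1 - \dfrac{\beta}{\beta + \sum_{p \ne i} d^{\alpha}(i,p)} & i = j, \\ -\dfrac{d^{\alpha}(i,j)}{\sqrt{\left(\beta + \sum_{p \ne i} d^{\alpha}(i,p)\right)\left(\beta + \sum_{p \ne j} d^{\alpha}(p,j)\right)}} & \text{otherwise.} \end{cases} \qquad (8)$$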

Here the constant $\beta$ is used to tune the spectrum of the graph
$G$, hence the structure of diffusion wavelets.

Proposition 1: The eigenvalues of $\Lambda$ lie between 0 and 2,
and the maximum eigenvalue $\sigma_{\max}(\Lambda)$ is a decreasing function
in $\beta$.
The proof is postponed to Appendix A.

Based on this proposition, two straightforward choices of
the diffusion operator $O$ are ($I$ is the identity matrix):

$O = I - \mathcal{L}$ or $O = \mathcal{L}/2$;

both have their eigenvalues ranging from 0 to 1. Therefore,
repeatedly raising $O$ to dyadic powers will make all the
eigenvalues diminish eventually. So we partition the range
space of $O$ and group the eigenvectors to form the basis, by
thresholding on the diminishing eigenvalues of the dyadic powers
of $O$. Based on the above construction procedure, we generate
the diffusion wavelets for a WSN with 100 nodes and illustrate
some of them in Fig. 4.
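
The construction of $\Omega$, the normalized Laplacian, and the diffusion operator can be sketched as follows. This is an illustrative sketch only: it stops at the operator $O = I - \mathcal{L}$ and does not build the diffusion wavelets themselves. The function names and the random node coordinates in the usage note are hypothetical. The node degree is assumed to include the diagonal weight $\beta$, which is the convention under which the diagonal of the Laplacian is $1 - \beta/\sum_p \omega_{ip}$.

```python
import numpy as np

def spatial_adjacency(coords, alpha=-1.0, beta=1.0):
    """Weights as in (7): d(i, j)**alpha off the diagonal, beta on the diagonal."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, 1.0)        # placeholder so that 0**alpha is never evaluated
    omega = dist ** alpha
    np.fill_diagonal(omega, beta)
    return omega

def normalized_laplacian(omega):
    """I - D^{-1/2} Omega D^{-1/2}, with degrees that include the diagonal weight."""
    deg = omega.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(len(omega)) - d_inv_sqrt[:, None] * omega * d_inv_sqrt[None, :]

def diffusion_operator(omega):
    """One of the two choices discussed in the text: O = I - L."""
    return np.eye(len(omega)) - normalized_laplacian(omega)
```

For example, with `coords = np.random.default_rng(1).random((30, 2))`, the eigenvalues of `normalized_laplacian(spatial_adjacency(coords))` can be inspected with `np.linalg.eigvalsh` to observe the behavior stated in Proposition 1 for different values of `beta`.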

2) Joint Spatial and Temporal Correlation: As the diffusion
wavelet basis stems from a graph that represents the data
correlation, we can extend the basis defined for spatial correlations
to include temporal correlations as well. The idea is that, for
a given time period indexed by $\mathcal{R}$, we replicate the graph $G$
$|\mathcal{R}|$ times and also index the copies by $\mathcal{R}$. Within each graph
$G^r$, the weighted adjacency matrix is still $\Omega$ in (7). Between
two graphs $G^{r_1}$ and $G^{r_2}$, the weight between node $i$ in $G^{r_1}$
and node $j$ in $G^{r_2}$ is given by $\omega_{ij} / g(|r_1 - r_2|)$,

Fig. 4. Diffusion wavelets for a 100-node WSN: (a) scaling function,
(b) wavelet function I, (c) wavelet function II, (d) wavelet function III.
Although these wavelets are discrete, we present them as interpolated
surfaces to facilitate visual illustration.



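where $g(\cdot)$ is an increasing function. This extended adjacency
matrix is given as

$$\tilde{\Omega} = \begin{bmatrix}
\Omega & \Omega I^{-1}_{g(|r_2-r_1|)} & \cdots & \Omega I^{-1}_{g(|r_{|\mathcal{R}|}-r_1|)} \\
\Omega I^{-1}_{g(|r_1-r_2|)} & \Omega & \cdots & \Omega I^{-1}_{g(|r_{|\mathcal{R}|}-r_2|)} \\
\vdots & \vdots & \ddots & \vdots \\
\Omega I^{-1}_{g(|r_1-r_{|\mathcal{R}|}|)} & \Omega I^{-1}_{g(|r_2-r_{|\mathcal{R}|}|)} & \cdots & \Omega
\end{bmatrix},$$

where $I_{g(|r_1-r_2|)}$ is a diagonal matrix with $g(|r_1 - r_2|)$ on
its diagonal. The temporal correlation represented by $\tilde{\Omega}$ can
be fine-tuned by $g(\cdot)$: the larger the first-order derivative of
$g(\cdot)$, the faster the temporal correlation attenuates. Based on
this extension, we can derive the diffusion operator and hence
diffusion wavelets following exactly the same procedure as
presented in Sec. IV-B1.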
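
As a rough illustration of the extended adjacency matrix, the sketch below stacks copies of a spatial matrix $\Omega$ with a Kronecker product. It rests on an assumption: the weight between two copies of the graph is taken to be the spatial weight divided by $g(|r_1 - r_2|)$, so that an increasing $g$ attenuates the temporal correlation. The function name and the sample inputs are hypothetical.

```python
import numpy as np

def spatiotemporal_adjacency(omega, rounds, g):
    """Block matrix whose (a, b) block is omega scaled by 1 / g(|r_a - r_b|).

    Diagonal blocks are left as omega itself, i.e. the spatial weights
    within a single round are unchanged.
    """
    rounds = np.asarray(rounds, dtype=float)
    lag = np.abs(np.subtract.outer(rounds, rounds))
    scale = 1.0 / g(lag)              # attenuation between graph copies (assumption)
    np.fill_diagonal(scale, 1.0)      # same round: no temporal attenuation
    return np.kron(scale, omega)
```

With `g = lambda lag: np.exp(0.5 * (lag + 1.0))` and three rounds, a 3 x 3 matrix `omega` yields a 9 x 9 extended matrix that can be passed to the same Laplacian construction as in the purely spatial case.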

Intuitively, the benefit of involving spatial and temporal 
correlations together is twofold. Firstly, for large-scale WSNs 
with hundreds or thousands of nodes, the number of measure- 
ments k to be collected for each round could be reduced while 
still achieving the same level of recovery fidelity. Secondly, for 
small scale WSNs with only tens of nodes, CS often fails to 
deliver satisfactory recovery accuracy for individual rounds, 
which stems from the asymptotic nature of the CS theory. 
However, we could still apply CS aggregation but recover 
sensory data from a certain period (several consecutive rounds) 
as a whole. We will confirm these intuitions in Sec. V-C.

C. Second Level Recovery

Based on our assumption in Sec. IV-A, the field on top of
the whole sensing area (which is only sparsely covered by a
WSN) can be deemed as an "image" with each cell being a
"pixel" of the image. Therefore, given the recovery results $\hat{u}^r$
from the first level, one could use the CS technique again:
solve another $\ell_1$ minimization problem (with $\hat{u}^r$ as the input)
to recover the field. As this time the data items are well aligned
into a 2D grid, $\Psi$ can be any basis for image compression,
including DCT or wavelets.

However, it is well known that SVD is optimal in decomposing
a matrix into separable (additive) components, as it
leads to the minimum number of coefficients. Mathematically,
for a matrix $X$, we have $X = \sum_i \sigma_i A_i$, where $\sigma_i$ is the
$i$-th singular value and $A_i$ is a rank-1 matrix given by the outer
product of the $i$-th left and right singular vectors. Therefore, a
better choice is to use $\{A_i\}$ as a counterpart of $\Psi$ for "matrix
CS". This indeed corresponds to the idea of matrix completion
discussed in Sec. III-C. Unfortunately, we cannot directly solve
the nuclear-norm minimization problem (5), because in general
$\hat{u}^r \ne \overrightarrow{\mathcal{P}_\Pi(F^r)}$, where $\Pi$ refers to the location of WSN nodes
in the matrix, $\mathcal{P}_\Pi$ is the operator defined in Sec. III-C, and
$\overrightarrow{\,\cdot\,}$ transforms a matrix to a vector indexed by node IDs.
As the original sensory data $u^r$ is often not sparse but only
compressible, $\hat{u}^r$ is only an approximation of $u^r$ in both $\ell_1$
and $\ell_2$ norms [18]. Therefore, we have $\hat{u}^r = \overrightarrow{\mathcal{P}_\Pi(F^r)} + \xi$,
with $\xi$ being the error term resulting from the previous level,
and the current recovery relies on solving another optimization
problem

$$\begin{aligned} \text{minimize} \quad & \|X\|_{*} \\ \text{subject to} \quad & \|\hat{u}^r - \overrightarrow{\mathcal{P}_\Pi(X)}\|_{\ell_2} \le \delta. \end{aligned} \qquad (9)$$



Since the error bound $\delta$ comes from the assumption that
$\|\xi\|_{\ell_2} \le \delta$, it depends on the parameters of the first level
recovery (e.g., $\epsilon$ and $m$). According to the theory of stable
matrix completion [27], the optimal solution of this problem
approximates $F^r$ in $\ell_2$ norm.
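
The idea behind (9) can be illustrated with a small numerical sketch. It is not the FPC solver used in the experiments: it swaps in a Soft-Impute-style iteration, which repeatedly fills the unobserved cells with the current estimate and shrinks the singular values, and which targets the Lagrangian form $\frac{1}{2}\|\mathcal{P}_\Pi(X - F)\|^2 + \lambda\|X\|_{*}$ rather than the constrained form. The function names, the parameters `lam` and `n_iter`, and the synthetic rank-2 field are assumptions made for illustration.

```python
import numpy as np

def soft_impute(observed, mask, lam=0.5, n_iter=300):
    """Approximate nuclear-norm-regularized completion of a partially observed matrix.

    observed: matrix holding the known values (entries outside mask are ignored)
    mask:     boolean matrix, True where a cell is covered by a node
    """
    X = np.zeros(observed.shape)
    for _ in range(n_iter):
        filled = np.where(mask, observed, X)        # data where known, estimate elsewhere
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        X = (U * np.maximum(s - lam, 0.0)) @ Vt     # shrink the singular values by lam
    return X

def demo_second_level(a=30, b=30, rank=2, coverage=0.6, seed=0):
    """Complete a synthetic low-rank field from a random subset of its cells."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((a, rank)) @ rng.standard_normal((rank, b))
    mask = rng.random((a, b)) < coverage
    X = soft_impute(np.where(mask, F, 0.0), mask)
    return np.linalg.norm(X - F) / np.linalg.norm(F)
```

In a dual-level setting, `observed` would hold the first-level estimates placed at the node locations, so any first-level error propagates into the completed field, which is what the bound $\delta$ accounts for.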

As a summary of the joint effect of the dual-level recovery, 
we have the following result. 

Proposition 2: If n ≥ C(max(a, b)^ log^(max(a, b))) (second
level) and k ≥ O(m log(n/m)) (first level), the optimal
solution of (9), using the optimal solution of (6) as input,
approximates $F^r$ in $\ell_2$ norm:

$$\|\hat{X} - F^r\|_{\ell_2} \le 4\sqrt{\frac{(2ab + n)\min(a,b)}{n}}\,\delta + 2\delta.$$



This is proven by combining the results stated in [18] and
[27], as shown in Appendix B. We will show in Sec. V
that, in practice, $k/(ab)$ is below 5%. This means that, to
recover a field represented by $a \times b$ samples, fewer than
5% of that number of measurements need to be collected from the
monitoring WSN.

D. Efficient Data Routing 

The routing design for DECA is built on top of the hybrid
CS aggregation scheme introduced in Sec. III-D. As we
demonstrated in [28], better energy efficiency can be achieved
by decreasing $k$. However, applying CS aggregation on
the whole network with a small $k$ might not yield an
acceptable recovery. Thanks to the large redundancy in sensory
data, we can intuitively expect a spatially localized subset of the
data to remain sparse in a proper domain. For large-scale WSNs,
we therefore seek to further cut down the energy cost by partitioning
the network into several subnetworks and carrying out CS
coding (with a small $k$) independently within each part. At
the decoding end, we use joint reconstruction [14] to
recover the data. Formally, the entire aggregation structure is
a set $\mathcal{T}$ of disjoint data aggregation trees, all rooted at distinct
one-hop neighbors of the sink, and each tree $T_i \in \mathcal{T}$ has
$n_i < n$ nodes and $k_i < k$ measurements, with
$\sum_{T_i\in\mathcal{T}} n_i = n$ and $\sum_{T_i\in\mathcal{T}} k_i = k$. By
tuning $k_i$, we hope to strike a balance between the energy
efficiency and recovery performance. Note that each $T_i$ is
constructed to be nearly optimal using the greedy algorithm
presented in [28]. However, the partition causes the sensing
matrix $\Phi$ to have a block-diagonal shape:



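$$\Phi = \begin{bmatrix} \Phi_1 & & & \\ & \Phi_2 & & \\ & & \ddots & \\ & & & \Phi_{|\mathcal{T}|} \end{bmatrix},$$

where $\Phi_i$ is a $k_i \times n_i$ matrix with random entries as specified
in Sec. III-A. Now the question is whether this $\Phi$ satisfies RIP
(which requires $\Phi$ to be full; see Sec. III-A). Fortunately, we
have the following result: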

Proposition 3: For a given signal $u = \Psi w$ with $\|w\|_{\ell_0} = m$,
and a partition scheme as stated above, $u$ can be recovered
exactly with high probability from the random samples $v = \Phi u$
by solving (6) if the number of samples satisfies

k = O(m |T| log^ n).

The proof is based on the results provided in [17] (Proposition
3.3 and Theorem 3.4). Readers are referred to Appendix C for
a sketch.
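
The block-diagonal structure of the sensing matrix under tree partitioning can be sketched as follows. The function name is hypothetical, and the scaled Gaussian entries are an assumption standing in for the random entries specified in Sec. III-A.

```python
import numpy as np

def block_diagonal_sensing(tree_sizes, tree_measurements, seed=0):
    """Build a k x n sensing matrix with one random k_i x n_i block per tree.

    tree_sizes:        list of n_i, the number of nodes in each tree
    tree_measurements: list of k_i, the number of measurements per tree
    """
    rng = np.random.default_rng(seed)
    k, n = sum(tree_measurements), sum(tree_sizes)
    Phi = np.zeros((k, n))
    row = col = 0
    for n_i, k_i in zip(tree_sizes, tree_measurements):
        # Each tree only mixes the readings of its own nodes.
        Phi[row:row + k_i, col:col + n_i] = rng.standard_normal((k_i, n_i)) / np.sqrt(k_i)
        row += k_i
        col += n_i
    return Phi
```

For instance, `block_diagonal_sensing([450] * 4, [30] * 4)` gives a 120 x 1800 matrix for four equal trees. Joint recovery would use this whole matrix in (6), whereas independent recovery would solve one problem per block.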

V. Performance Evaluations 

In this section, we evaluate the performance of DECA with 
respect to recovery accuracy and energy efficiency, based on a 
large number of experiments using both synthetic data and real 
data sets. Moreover, we also address Q2 stated in Sec. IV-A:
we show that DECA allows a WSN user to fine-tune the 
tradeoff between recovery accuracy and energy efficiency. 

A. Experiment Settings 

Existing online data sets are often collected by small-scale
WSNs (e.g., EPFL SensorScope [32] and Intel Labs Berkeley
WSN data [33]). To also mimic widespread fields monitored
by large-scale WSNs, we come up with three different ways
to generate the data sets for our experiments.

1) Peak: A synthetic data set generated by the peaks function
in Matlab.

2) Intel: Real data sets obtained by a WSN deployed at
Intel Labs Berkeley [33].

3) Temp: Temperature distribution in the USA retrieved from
http://www.weather.gov.



For the first and the last data sets, we take a subset spread
over a square area with 100 x 100 cells. The (field) value within
each cell is set to be constant. In order to monitor such a field,
we deploy a WSN on it by randomly putting nodes in cells
with a coverage rate $p$, so the network size is $n = p \cdot 10^4$. We
fix $p$ for the first data set, but we will vary it for the third data
set. The in-network CS aggregation is performed in two ways:
it either routes data through a single tree with sample size $k$,
or through four disjoint trees of equal size, with $k_i = k/4$ or
$k/3$ for each tree.
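
The deployment described above can be mimicked with a few lines of code. This is only a sketch under stated assumptions: the function name is hypothetical, and placing at most one node per cell is assumed so that the network size equals $p \cdot 10^4$ exactly.

```python
import numpy as np

def deploy_nodes(p, side=100, seed=0):
    """Pick n = p * side^2 distinct cells of a side x side grid at random.

    Returns an (n, 2) integer array of (row, column) cell indices.
    """
    rng = np.random.default_rng(seed)
    n = int(round(p * side * side))
    cells = rng.choice(side * side, size=n, replace=False)   # at most one node per cell
    rows, cols = np.divmod(cells, side)
    return np.column_stack([rows, cols])
```

With `p = 0.18`, the function returns 1800 cell positions. The sensory data vector is then the field sampled at those cells, e.g. `field[nodes[:, 0], nodes[:, 1]]` for a 100 x 100 array `field`.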

For the first level recovery of the multi-tree CS aggregated
data, we either apply the diffusion wavelet basis for the whole
WSN to directly recover the sensory vector $u$, or we conduct
CS recoveries for individual trees using their respective
diffusion wavelet bases. We call these two mechanisms joint
recovery (JR) and independent recovery (IR), respectively. We
use $\ell_1$-magic [30] and FPC [34] to solve the minimization
problems for the first and second level recoveries. The recovery
accuracy is measured by the recovery error,
defined as the normalized mean square error in the following:

$$\frac{\|\hat{u} - u\|_{\ell_2}}{\|u\|_{\ell_2}} \qquad \text{and} \qquad \frac{\|\hat{X} - X\|_{\ell_2}}{\|X\|_{\ell_2}}. \qquad (10)$$



They are defined for vector recovery (first level) and matrix
recovery (second level), respectively. For energy efficiency
evaluation, we set the single-hop transmission cost $c(i,j)$
to be proportional to the cube of the distance between the
communicating pair $i$ and $j$.
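
The two evaluation quantities can be written down directly. In this sketch the function names and the proportionality constant `scale` are hypothetical, and the matrix norm is taken to be the entrywise $\ell_2$ (Frobenius) norm.

```python
import numpy as np

def recovery_error(estimate, truth):
    """Normalized error ||estimate - truth|| / ||truth||.

    For 1-D inputs this is the l2 norm ratio, for 2-D inputs the
    Frobenius norm ratio (NumPy's default norm in both cases).
    """
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)

def hop_cost(pos_i, pos_j, scale=1.0):
    """Single-hop transmission cost, proportional to the cube of the distance."""
    d = np.linalg.norm(np.asarray(pos_i, dtype=float) - np.asarray(pos_j, dtype=float))
    return scale * d ** 3
```

For example, two nodes at (0, 0) and (3, 4) are 5 units apart, so `hop_cost((0, 0), (3, 4))` evaluates to 125 with the default `scale`.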

During our experiments, we have tested a number of
diffusion operators, by varying $\alpha$ and $\beta$ in the Laplacian (8)
and by setting $O = I - \mathcal{L}$ or $O = \mathcal{L}/2$, where $\mathcal{L}$ is the
normalized Laplacian. According to our observations, $O = I - \mathcal{L}$
performs much better than $O = \mathcal{L}/2$ as the sparse basis, while
the parameter tuple $\alpha \in [-1, -1/3]$ and $\beta \in [0, 2]$ suggests
the sparsest representation for given sensory data (e.g., Fig. 6).
Therefore, we fix $\alpha = -1$ and $\beta = 1$ and use $O = I - \mathcal{L}$ in
later experiments.

Remark: We will not compare DECA with other mechanisms, 
as DECA is the only one that can handle field recovery based 
on incomplete data samples. 

B. Synthetic Field: Peak 

This field is generated by the peaks function in Matlab,
whose 2D image is shown in Fig. 5(a). A WSN is randomly
deployed on the field, with the sink fixed at the center to collect
data from the whole network. We illustrate the dual-level
recovery process of DECA in Fig. 5. For later experiments,
we run through the DECA process 10 times (with different
random CS coding) on each of 10 random deployments to
conduct dual-level recovery, and we report the mean values of
these 100 processes.

In order to show that CS aggregation works well for
tree-partitioned WSNs, we need to demonstrate that the sensory
data $u$ only has low frequency components when projected
onto the diffusion wavelet basis (see Sec. IV-D). In Fig. 6, we
plot the diffusion wavelet coefficients for sensory data obtained
by WSNs of different sizes. As these coefficients are sorted

Fig. 7. Comparisons based on synthetic data. Here ST/MT corresponds to CDA on a single tree or multiple trees. Whereas $k$ refers to the CS measurements
in the ST case, $k_i$ denotes the measurements used for each subtree. IR and JR are short for independent recovery and joint recovery, respectively.

Fig. 5. Illustration of the DECA process (in clockwise order). We have
$n = 2600$ and $k_i = 120$ for each subtree.



in descending order by their frequencies, it is evident that 
sensory data contain mostly low frequency components of the 
diffusion wavelet basis. 

Now we fix the coverage ratio $p = 0.18$, so the WSN
has 1800 nodes. We evaluate the tradeoff between recovery
accuracy and aggregation cost, by tuning $k$ (the number of
measurements) for CS coding. The results are given in Fig. 7:
(a) and (b) show the first and the second level recovery errors,
respectively, for different data aggregation schemes, while
(c) plots the energy costs incurred by these schemes. If CS
coding is performed on a single tree, then compared with
non-aggregation, around $k = 7\%\,n$ CS measurements lead to
a final recovery error lower than 20% and a 40% saving in energy
cost. Of course, adding more measurements will continuously

Fig. 6. Diffusion wavelet coefficients for WSNs with different sizes.



reduce the recovery errors at the cost of increasing the aggre- 
gation cost. 

To further reduce the energy consumption, we carry out a 
four-equal-tree partition and conduct CS coding independently 
within each subtree. If we take $k_i = k/4$ measurements for
each subtree, compared with the single tree case, independent 
recovery leads to worse performance whereas joint recovery 
gives almost the same outcome. We may attribute this to the 
loss of correlation between adjacent partitions under inde- 
pendent recovery. In fact, this suggests that, using diffusion 
wavelet basis, a block diagonal sensing matrix is comparable 
with a full sensing matrix. Meanwhile, the energy consumption 
is almost halved. 

To improve the recovery performance, we set $k_i = k/3$,
and it performs better than the single tree case in terms of 
the recovery error, while still reducing at least 40% of the 
energy consumption. In summary, DECA, especially with its 
tree partitioning CS aggregation, achieves very high energy 
efficiency while preserving the fidelity of data recovery. 

C. Real Data from An Actual WSN 

Now we proceed to analyze DECA on the real sensory data 
collected by Intel Labs Berkeley [33]. This WSN consisted

Fig. 9. Comparisons based on the USA temperature field, where $p_1 = 0.14$, $p_2 = 0.18$, $p_3 = 0.22$, and $p_4 = 0.26$. In all the MT cases, $k_i = k/3$ for
each subtree. In (b), the DR plots indicate the direct recoveries from the sensory data, which serve as baselines for the second level performance. As for (c), the
baseline is the aggregation cost of a 100 x 100 grid network that fully covers the "image".



of 54 sensors. With such a small scale, it might not be 
worth applying CS aggregation, as the required number of 
CS measurements could be comparable to the network size n. 
However, what we want to show here is that, as DECA allows 
jointly recovering several consecutive snapshots $\{u^r\}_{r\in\mathcal{R}}$ by
leveraging on the temporal correlation, CS aggregation can 
still be useful for small-scale WSNs. 

To jointly consider spatial and temporal correlations in data,
we make use of the diffusion operator proposed in Sec. IV-B2
to generate the diffusion wavelet basis. We take 10 consecutive
(in time) snapshots, with the time interval between two sets
of readings equal to 10 or 30 minutes. The further apart two
snapshots are in time, the less likely they are to be
correlated. Therefore, when constructing the diffusion
operators, we set $g(\cdot) = \exp(0.5\,(|r_1 - r_2| + 1))$ for the case
with a 10-minute interval and $g(\cdot) = \exp(|r_1 - r_2| + 1)$ for
that with a 30-minute interval. Setting $k = 10$, we compare
the joint spatial and temporal recovery with the independent
spatial recovery in Fig. 8.

Note that though each snapshot $u^r$ is spatially and irregularly
distributed, we deliberately sort them according to their
indices in each snapshot. As a result, the data appear to exhibit
certain periodicity, which indeed indicates the existence of
temporal correlation. It is evident from Fig. 8 that, whereas the
individual spatial recovery does not deliver any meaningful
recovery of the sensory data, DECA's joint spatial and temporal
recovery always gives excellent results. This is the case even
when the sensory data appear to be non-stationary, as shown
in Fig. 8(c).

Remark: We cannot evaluate the energy efficiency for this case,
as we do not have access to the original network topology.
However, with $n = 54$ and $k = 10$, CS aggregation is bound
to save energy compared with non-aggregation.

D. Real Field: Temperature Distribution 

In this section, we validate the effectiveness of DECA over
a set of temperature distribution data provided by NOAA
(http://www.noaa.gov/). The NOAA datasets have been widely used
by the WSN research community, e.g., [5], [18], as they are
considered as an analogy to the sensory data. The field that
we take as an example is shown in Fig. 10(a), and we give
one example of the final recovery result in Fig. 10(b), which
accurately captures the features of the original field. Differing
from the evaluations reported in Sec. V-B, in this case we also
vary the WSN size by setting $p \in \{0.14, 0.18, 0.22, 0.26\}$. The
performance comparisons with respect to different settings are
plotted in Fig. 9.

From Fig. 9(a) and (b), we can observe that, as $k$ increases,
though the first level errors keep decreasing (no matter what 
value is taken for p), the second level errors become somewhat 
saturated. According to our experiments with matrix comple- 
tion directly from the sensory data, the second level recovery 
errors are actually approaching such limits. Therefore, we 
have shown that, DECA not only enables field recovery from 

Fig. 10. Illustration of the temperature field recovery using DECA. We have
$n = 2200$ and four subtrees with $k_i = 160$.



incomplete sensory data (due to sparse coverage of a WSN), 
but it also allows energy saving by performing CS aggregation 
in the WSN. Specifically, to recover a field represented by 
10,000 samples, only 480 measurements (< 5%) need to be 
collected from the WSN. 

VI. Conclusion 

Leveraging on the recent developments in compressed sens- 
ing and harmonic analysis, we have proposed in this paper 
the Dual-lEvel Compressed Aggregation (DECA) framework
to recover a field (of certain physical quantities) surveyed by
a WSN. Although WSNs have long been deemed as powerful
tools to monitor fields, the DECA framework is novel because
we are the first to tackle the issue of recovering fields from 
the aggregated version of the sensory data that are already 
incomplete, whereas existing proposals are mostly concerned 
with recovering sensory data from aggregated measurements. 

We achieve our goal by developing a novel combination of 
classic compressed sensing technique with matrix completion, 
which allows us to "suit the medicine to the illness" by 
tackling two problems with dedicated tools. Specifically, we 
use diffusion wavelet based compressed sensing to recover 
sensory data at the first level, then we apply matrix com- 
pletion to recover a field at the second level. Our perfor- 
mance evaluations with intensive experiments have shown 
that DECA can achieve high recovery accuracy while still 
reducing energy consumption compared with traditional data 
collection schemes. In addition, DECA allows a WSN user to 
fine-tune the tradeoff between recovery accuracy and energy 
efficiency. Finally, by jointly exploiting spatial and temporal 
correlations in sensory data, DECA is applicable even to small- 
scale WSNs. 

We are on the way to refining the DECA framework, aiming 
at tuning the parameters to further improve the recovery accu- 
racy. Also we intend to make use of the recent model-based CS 
to further cut down the number of measurements and hence the 
energy cost. Moreover, we are interested in extending DECA
to 3D fields and hence to the 3D WSNs monitoring such fields.

Appendix A
Proof of Proposition 1

Due to the connectivity of the communication graph, combining
Lemma 1.7 (iv) and (v) in [26], we know that all
eigenvalues of the normalized Laplacian $\mathcal{L}$ lie between 0 and 2.

The largest eigenvalue can be represented as

$$\sigma_{\max}(\mathcal{L}) = \sup_f \frac{\sum_{(i,j)\in E} \omega_{ij}\,(f(i) - f(j))^2}{\sum_i f^2(i) \sum_p \omega_{ip}},$$

where $f(\cdot)$ is an arbitrary real function assigned to each vertex.
The constant $\beta$ enters only the denominator, through $\omega_{ii} = \beta$,
and the denominator grows with $\beta$.
Therefore, $\sigma_{\max}(\mathcal{L})$ is a decreasing function in $\beta$. Q.E.D.

Appendix B
Proof of Proposition 2

In our case, we have $n$ observed entries out of $a \times b$ samples.
As suggested by (III.3) in [27], we have

$$\|\hat{X} - F^r\|_{\ell_2} \le 4\sqrt{\frac{(2 + q)\min(a,b)}{q}}\,\delta + 2\delta.$$

Then Proposition 2 follows by simply plugging in $q = n/(ab)$,
which indicates the fraction of observed entries. Q.E.D.
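
The substitution in this last step can be written out explicitly. The lines below assume that the bound taken from (III.3) in [27] has the form $4\sqrt{(2+q)\min(a,b)/q}\,\delta + 2\delta$. The numerical instance ($a = b = 100$, $n = 1800$) is chosen only to match the setting of Sec. V and is not part of the proof.

```latex
% Plugging q = n/(ab) into the factor under the square root:
\frac{(2+q)\min(a,b)}{q}
  = \frac{\bigl(2 + \tfrac{n}{ab}\bigr)\min(a,b)}{\tfrac{n}{ab}}
  = \frac{(2ab + n)\min(a,b)}{n}.

% Illustrative instance: a = b = 100, n = 1800
\frac{(2\cdot 10^{4} + 1800)\cdot 100}{1800} \approx 1211,
\qquad
4\sqrt{1211}\,\delta + 2\delta \approx 141\,\delta .
```

The constant in front of $\delta$ therefore shrinks as the coverage $n/(ab)$ grows, which matches the intuition that a denser WSN propagates less of the first-level error into the recovered field.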

Appendix C
Proof of Proposition 3
Here we take the results from [17]:

Theorem 3.4: For a given signal $u = \Psi w$ with
$\|w\|_{\ell_0} = m$ and a clustering (permutation) scheme
with parameter $\mu \in [0,1]$, the $\ell_1$ optimizer can
recover $u$ exactly with high probability if the number
of measurements $k = O(m \mu n_t \log n)$, where $n_t$ is
the number of clusters.
The parameter $\mu$ is defined to be the maximum energy overlap
between the sensing matrix and the sparse basis. Mathematically,






with $I^t = 1$ indicating that $\psi_j$ overlaps with cluster $t$ and
$I^t = 0$ otherwise. In our case, we generate the sparse
basis from diffusion wavelets, and we take the upper bound
$\mu = 1$. If the network is partitioned into $|\mathcal{T}|$ subtrees, we
need k = O(m |T| log^ n) random samples to guarantee the
recovery performance. Q.E.D.



